We are seeking a Software Engineer - AI Training Data to join our team in Palo Alto. This role focuses on addressing complex challenges in data management and is ideal for someone passionate about building innovative systems that enhance AI training capabilities.
Our company is dedicated to revolutionizing the semiconductor industry through advanced technology. By developing super-intelligent systems, we aim to significantly accelerate hardware innovation. We focus on creating cutting-edge models that can deeply understand the complexities of semiconductors and electronics, ultimately transforming how the industry operates.
As a Software Engineer - AI Training Data, your primary responsibility will be to build and optimize the world’s largest semiconductor dataset. You will leverage your software engineering expertise to prepare data for our Machine Learning team and develop scalable systems to manage diverse types of information, including text, images, circuits, and more. Your work will be critical in ensuring that our AI models are trained with high-quality data.
What We Can Offer You:
- Competitive salary range of $150,000 to $350,000, based on experience and potential impact.
- Unlimited PTO to ensure a healthy work-life balance.
- Comprehensive health coverage, including medical and dental benefits.
- Opportunities for professional growth and development through challenging projects.
- Visa sponsorship for international candidates.
Key Responsibilities:
- Build and manage a comprehensive semiconductor dataset.
- Develop software to scrape and process data at scale.
- Extract and clean information from diverse data modalities, including text, images, circuits, and simulations.
- Prepare and preprocess data for the Machine Learning team.
- Build systems to manage the transfer of customer data and feedback.
- Parse documents in various formats and structures.
- Develop software pipelines for data labelers and manage workloads on large cloud compute clusters.
- Implement systems for preprocessing datasets for AI training.
Relevant Keywords:
In this role, you will work with data pipelines, PDF parsing, and cloud infrastructure to prepare and optimize AI training data. Your expertise will help us maintain high data quality and support the performance of our models.
Required Skill Sets:
- Proven experience in building scalable software solutions for data management.
- Expertise in PDF parsing and data extraction techniques.
- Strong software engineering skills focused on improving data quality and model performance.
- Experience handling diverse modalities beyond text.
- Ability to develop custom data processing libraries from the ground up.
- Familiarity with state-of-the-art techniques for preparing AI training data.
- Proficiency in organizing and managing data across multiple cloud environments.
Bonus Points:
- Background in Electrical Engineering.
- Experience relating machine learning model behavior to data quality.
- Experience fine-tuning large language models.
- Prior experience at a hyper-growth startup.
- Experience building systems for training foundation models.
If you are ready to take on the role of Software Engineer - AI Training Data and contribute to cutting-edge advancements in the semiconductor field, we invite you to apply.